Abstract: The theory of stochastic vector quantisers (SVQ) has been extended
to allow the quantiser to develop invariances, so that only "large" degrees of 
freedom in the input vector are represented in the code. This has been applied 
to the problem of encoding data vectors which are a superposition of a "large" 
jammer and a "small" signal, so that only the jammer is represented in the 
code. This allows the jammer to be subtracted from the total input vector 
(i.e. the jammer is nulled), leaving a residual that contains only the underlying 
signal. The main advantage of this approach to jammer nulling is that little prior 
knowledge of the jammer is assumed, because these properties are automatically 
discovered by the SVQ as it is trained on examples of input vectors. 

1 Introduction 

In vector quantisation a code book is used to encode each input vector as a 
corresponding code index, which is then decoded (again, using the code book)
to produce an approximate reconstruction of the original input vector [2].
The standard approach to vector quantiser (VQ) design [?] may be generalised
[?] so that each input vector is encoded as a vector of code indices that are
stochastically sampled from a probability distribution that depends on the input 
vector, rather than as a single code index that is the deterministic outcome of 
finding which entry in a code book is closest to the input vector. This will be 
called a stochastic VQ (SVQ), and it includes the standard VQ as a special case. 

One advantage of using the stochastic approach is that it automates the 
process of splitting high-dimensional input vectors into low-dimensional blocks 
before encoding them, because minimising the mean Euclidean reconstruction 
error can encourage different stochastically sampled code indices to become 
associated with different input subspaces [?]. Another advantage is that
it is very easy to connect SVQs together, by using the vector of code index
probabilities computed by one SVQ as the input vector to another SVQ [?].

SVQ theory will be extended to the case of encoding noisy (or distorted) 
data, with the intention of subsequently reconstructing an approximation to 
the noiseless data. This theory is then applied to the problem of encoding data
vectors which are a superposition of a "large" jammer and a "small" signal, 
where the signal is regarded as a distortion superimposed on the jammer, rather 
than the other way around. The reconstruction is then an approximation to the 
jammer, which can thus be subtracted from the original data to reveal the 
underlying signal of interest. 

In Section 2 the underlying theory of SVQs is developed together with its
extension to the encoding of noisy data, and in Section 3 some simulations
illustrating the application of SVQs to the nulling of jammers are presented.

2 Stochastic Vector Quantiser Theory 

In Section 2.1 the basic theory of folded Markov chains (FMC) is given [?]. In
Section 2.2 FMC theory is extended to the case of encoding noisy or distorted
data with the intention of eventually recovering the undistorted data. In Section
2.3 this extended theory is applied to the problem of encoding data that contain
unwanted "nuisance degrees of freedom". In Section 2.4 some constraints
(including the threshold trick of [?]) on the optimisation of the encoder are
introduced to encourage the encoder to disregard the nuisance degrees of freedom
(i.e. to discover invariances). Finally, in Section 2.5 this invariant encoder
theory is applied to the problem of encoding and subsequently nulling "large"
jammers that obscure "small" signals.

2.1 Folded Markov Chains 

The basic building block of the SVQ used in this paper is the folded Markov
chain (FMC) [?]. An input vector $x$ is encoded as a code index vector $y$, which
is then decoded as a reconstruction $x'$ of the input vector. Both
the encoding and decoding operations are allowed to be probabilistic, in the
sense that $y$ is a sample drawn from $\Pr(y|x)$, and $x'$ is a sample drawn from
$\Pr(x'|y)$, where $\Pr(y|x)$ and $\Pr(x'|y)$ are Bayes' inverses of each other, as given
by $\Pr(x'|y) = \Pr(y|x')\Pr(x') / \int dz\, \Pr(y|z)\Pr(z)$, and $\Pr(x)$ is the prior
probability from which $x$ is sampled.
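For a discrete set of input values the Bayes' inverse above can be computed directly, with the integral over $z$ becoming a sum. The following Python sketch assumes such a discrete input set; the function name `bayes_inverse` and the nested-list layout of the probability tables are illustrative choices, not part of the original text.

```python
def bayes_inverse(prior, encoder):
    """Return post[y][i] = Pr(x_i | y) for a discrete set of input values.

    prior[i]      -- Pr(x_i), the prior over the discrete input values
    encoder[i][y] -- Pr(y | x_i), the stochastic encoder
    """
    n_codes = len(encoder[0])
    post = []
    for y in range(n_codes):
        # Numerator Pr(y|x_i) Pr(x_i) for every input value x_i
        joint = [enc[y] * p for p, enc in zip(prior, encoder)]
        # Denominator: discrete analogue of the integral over z of Pr(y|z) Pr(z)
        evidence = sum(joint)
        post.append([j / evidence for j in joint])
    return post


# Two input values with equal prior weight, two code indices (invented numbers)
post = bayes_inverse([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
print(post[0])  # Pr(x | y = first code index)
```

Each row of `post` is a proper distribution over the inputs, which is what allows the decoder to draw $x'$ from it.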

In order to ensure that the FMC encodes the input vector optimally, a mea- 
sure of the reconstruction error must be minimised. There are many possible 
ways to define this measure, but one that is consistent with many previous 
results, and which also leads to many new results, is the mean Euclidean
reconstruction error measure $D$, which is defined as

$$
D = \int dx\, \Pr(x) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x) \int dx'\, \Pr(x'|y)\, \|x - x'\|^2 \tag{1}
$$

where $y = (y_1, y_2, \ldots, y_n)$ with $1 \le y_i \le M$ is assumed, $\Pr(x)\Pr(y|x)\Pr(x'|y)$ is the
joint probability that the FMC has state $(x, y, x')$, $\|x - x'\|^2$ is the Euclidean
reconstruction error, and $\int dx \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \int dx' (\cdots)$ sums over all possible
states of the FMC (weighted by the joint probability).

Figure 1: A folded Markov chain (FMC) in which an input vector $x$ is encoded
as a code index vector $y$ that is drawn from a conditional probability $\Pr(y|x)$,
which is then decoded as a reconstruction vector $x'$ drawn from the Bayes'
inverse conditional probability $\Pr(x'|y)$.

The Bayes' inverse probability $\Pr(x'|y)$ may be integrated out of this expression
for $D$ to yield [?]

$$
D = 2 \int dx\, \Pr(x) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x)\, \|x - x'(y)\|^2 \tag{2}
$$

where the reconstruction vector $x'(y)$ is defined as $x'(y) = \int dx\, \Pr(x|y)\, x$. Because
of the quadratic form of the objective function, it turns out that $x'(y)$
may be treated as a free parameter whose optimum value (i.e. the solution of
$\partial D / \partial x'(y) = 0$) is $\int dx\, \Pr(x|y)\, x$, as required.
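The factor of 2 in Equation 2, and the claim that the posterior mean is the optimal $x'(y)$, can be checked numerically. The sketch below assumes a scalar code index ($n = 1$), a one-dimensional input and a small discrete prior, all with invented values; the function names are hypothetical. It evaluates Equation 1 with $x'$ drawn from the Bayes' inverse and Equation 2 with $x'(y)$ set to the posterior mean.

```python
def posterior_means(xs, prior, encoder):
    """x'(y) = sum over x of Pr(x|y) x, for each code index y."""
    means = []
    for y in range(len(encoder[0])):
        joint = [p * enc[y] for p, enc in zip(prior, encoder)]
        means.append(sum(j * x for j, x in zip(joint, xs)) / sum(joint))
    return means


def d_eq1(xs, prior, encoder):
    """Equation (1): x' is drawn from the Bayes' inverse Pr(x'|y)."""
    total = 0.0
    for y in range(len(encoder[0])):
        joint = [p * enc[y] for p, enc in zip(prior, encoder)]  # Pr(x) Pr(y|x)
        p_y = sum(joint)
        for j_x, x in zip(joint, xs):
            # Average of (x - x')^2 over x' ~ Pr(x'|y) = joint / p_y
            total += j_x * sum((j_k / p_y) * (x - x_k) ** 2 for j_k, x_k in zip(joint, xs))
    return total


def d_eq2(xs, prior, encoder, recon):
    """Equation (2): 2 * average of (x - x'(y))^2 for the given reconstructions."""
    return 2.0 * sum(p * enc[y] * (x - recon[y]) ** 2
                     for x, p, enc in zip(xs, prior, encoder)
                     for y in range(len(enc)))


xs = [-1.0, 0.0, 1.0, 2.0]
prior = [0.1, 0.4, 0.3, 0.2]
encoder = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.05, 0.95]]  # rows: Pr(y|x)
recon = posterior_means(xs, prior, encoder)
print(d_eq1(xs, prior, encoder), d_eq2(xs, prior, encoder, recon))
```

The two printed values agree, and moving either reconstruction away from its posterior mean makes Equation 2 larger.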

2.2 Noisy Data 

The FMC approach can be generalised to the problem of encoding noisy or distorted
data, with the intention of eventually recovering the undistorted data.
This generalisation is based on the results reported in [?]. The input vector is
$x_0$, which is converted into the distorted input vector $x$ by a distortion process
$\Pr(x|x_0)$; $x$ is then encoded as a code index vector $y$, which is
subsequently decoded as a reconstruction $x'_0$ of the original input vector. This is
described by the directed graph $x_0 \to x \to y \to x'_0$. The operations that
occur are summarised in Figure 2.
Figure 2: A folded Markov chain (FMC) in which an input vector $x_0$ is first
distorted into $x$, which is then encoded as a code index vector $y$ that is drawn
from a conditional probability $\Pr(y|x)$, which is then decoded as a reconstruction
vector $x'_0$ drawn from the Bayes' inverse conditional probability $\Pr(x'_0|y)$.



The mean Euclidean reconstruction error measure $D$ becomes (compare
Equation 1)

$$
D = \int dx_0\, \Pr(x_0) \int dx\, \Pr(x|x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x) \int dx'_0\, \Pr(x'_0|y)\, \|x_0 - x'_0\|^2 \tag{3}
$$

The Bayes' inverse probability $\Pr(x'_0|y)$ may be integrated out of this expression
for $D$ to yield (compare Equation 2)

$$
D = 2 \int dx_0\, \Pr(x_0) \int dx\, \Pr(x|x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x)\, \|x_0 - x'_0(y)\|^2 \tag{4}
$$

where the reconstruction vector $x'_0(y)$ is defined as $x'_0(y) = \int dx_0\, \Pr(x_0|y)\, x_0$,
which may be treated as a free parameter.

Bayes' theorem $\Pr(x_0)\Pr(x|x_0) = \Pr(x)\Pr(x_0|x)$ may be used to integrate
out $x_0$ to yield

$$
D = 2 \int dx\, \Pr(x) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x)\, \|x_0(x) - x'_0(y)\|^2 + \text{constant} \tag{5}
$$

where $x_0(x)$ is defined as $x_0(x) = \int dx_0\, \Pr(x_0|x)\, x_0$.
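The constant in Equation 5 depends neither on the encoder $\Pr(y|x)$ nor on the reconstructions $x'_0(y)$, which is why it can be ignored during optimisation. A small discrete check is sketched below, assuming scalar $x_0$ and $x$, a scalar code index and invented probability tables (all function names are hypothetical): the difference between Equation 4 and the non-constant part of Equation 5 comes out the same for two unrelated choices of encoder and reconstruction.

```python
def d_eq4(x0s, p_x0, p_x_given_x0, encoder, recon):
    """Equation (4): 2 * E[(x0 - x0'(y))^2] along the chain x0 -> x -> y."""
    total = 0.0
    for a, x0 in enumerate(x0s):
        for j, p_x in enumerate(p_x_given_x0[a]):
            for y, p_y in enumerate(encoder[j]):
                total += 2.0 * p_x0[a] * p_x * p_y * (x0 - recon[y]) ** 2
    return total


def d_eq5_variable_part(x0s, p_x0, p_x_given_x0, encoder, recon):
    """Equation (5) without its constant, using x0(x) = E[x0 | x]."""
    total = 0.0
    for j in range(len(encoder)):
        weights = [p_x0[a] * p_x_given_x0[a][j] for a in range(len(x0s))]
        p_x = sum(weights)  # Pr(x) for the j-th distorted value
        x0_of_x = sum(w * x0 for w, x0 in zip(weights, x0s)) / p_x
        for y, p_y in enumerate(encoder[j]):
            total += 2.0 * p_x * p_y * (x0_of_x - recon[y]) ** 2
    return total


def eq5_constant(x0s, p_x0, p_x_given_x0, encoder, recon):
    """Equation (4) minus the variable part of Equation (5)."""
    return (d_eq4(x0s, p_x0, p_x_given_x0, encoder, recon)
            - d_eq5_variable_part(x0s, p_x0, p_x_given_x0, encoder, recon))


x0s = [0.0, 1.0, 3.0]
p_x0 = [0.3, 0.5, 0.2]
p_x_given_x0 = [[0.8, 0.2, 0.0], [0.1, 0.7, 0.2], [0.0, 0.3, 0.7]]  # distortion
enc_a, rec_a = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]], [0.2, 2.5]
enc_b, rec_b = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]], [1.0, -1.0]
const_a = eq5_constant(x0s, p_x0, p_x_given_x0, enc_a, rec_a)
const_b = eq5_constant(x0s, p_x0, p_x_given_x0, enc_b, rec_b)
print(const_a, const_b)
```

The constant equals twice the average posterior variance of $x_0$ given $x$, which is fixed once the distortion process and prior are fixed.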

It is much more difficult to optimise this version of the objective function
than the version in Equation 2, because the $x_0(x)$ term is in general a non-linear
function of $x$. Worse still, the expression for $x_0(x)$ involves $\Pr(x_0|x)$, which
depends on the unknown $\Pr(x_0)$, so $x_0(x)$ cannot be computed analytically
anyway. The situation looks irretrievable, but it turns out that some progress
can be made by conceptually splitting $x$ into "signal" and "noise" subspaces, as
will be shown in Section 2.3.

2.3 Nuisance Degrees of Freedom 

For convenience, split up the input space into (possibly non-orthogonal) subspaces
as $(x_0, x_\perp)$, where all of the distortion is contained in $x_\perp$; this requires
that any distortion that lies in the $x_0$ subspace is regarded as part of the undistorted
input. The directed graph becomes $(x_0, 0) \to (x_0, x_\perp) \to y \to x'_0$, as
shown in Figure 3.
Figure 3: A folded Markov chain (FMC) in which an input vector $(x_0, 0)$ is first
distorted into $(x_0, x_\perp)$, which is then encoded as a code index vector $y$ that is
drawn from a conditional probability $\Pr(y|x_0, x_\perp)$, which is then decoded as a
reconstruction vector $x'_0$ drawn from the Bayes' inverse conditional probability
$\Pr(x'_0|y)$.

The expression for $D$ becomes (compare Equation 4)

$$
D = 2 \int dx_0\, \Pr(x_0) \int dx_\perp\, \Pr(x_\perp|x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x_0, x_\perp)\, \|x_0 - x'_0(y)\|^2 \tag{6}
$$

Consider the related optimisation problem in which an attempt to reconstruct
$(x_0, x_\perp)$ is made, as shown in Figure 4. The corresponding objective function
may be obtained by modifying Equation 6, where the cross-term arising from
non-orthogonal $(x_0, x_\perp)$ is omitted:

$$
D = 2 \int dx_0\, \Pr(x_0) \int dx_\perp\, \Pr(x_\perp|x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x_0, x_\perp) \left( \|x_0 - x'_0(y)\|^2 + \|x_\perp - x'_\perp(y)\|^2 \right) \tag{7}
$$

Figure 4: Modified version of Figure 3 in which the reconstruction link is
switched from the original undistorted signal to the full signal+distortion.

Assume for now (to be justified below) that some of the links in Figure 4 are
broken as shown in Figure 5. Because the distortion subspace is not involved
in the computations in Figure 5, it may be redrawn as shown in Figure 6. This
is the same as Figure 3 except that the encoder now disregards (or is invariant
with respect to) the nuisance degrees of freedom.

Figure 5: Modified version of Figure 4 in which the encoder (and reconstruction)
links from (and to) the distortion subspace are deleted (as indicated by the
dashed lines).

In order to break the links as shown in Figure 5, the following argument is
required:

1. Assume that the encoder is independent of $x_\perp$, so that $\Pr(y|x_0, x_\perp) = \Pr(y|x_0)$.

2. The $\|x_\perp - x'_\perp(y)\|^2$ term in $D$ needs to simplify to a constant.

3. This requires that $\int dx_0\, \Pr(x_0) \int dx_\perp\, \Pr(x_\perp|x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x_0)\, \|x_\perp - x'_\perp(y)\|^2 = \text{constant}$.

4. To guarantee this constant, it is sufficient to have $\int dx_\perp\, \Pr(x_\perp|x_0)\, \|x_\perp - x'_\perp(y)\|^2 = \text{constant}$ independent of $x_0$ and $y$.

5. To guarantee this constant independent of $x_0$ and $y$, it is sufficient to have $\Pr(x_\perp|x_0) = \Pr(x_\perp)$.

6. Given that $\Pr(x_\perp|x_0) = \Pr(x_\perp)$ and $\Pr(y|x_0, x_\perp) = \Pr(y|x_0)$, making the replacement $x'_\perp(y) \to \langle x_\perp \rangle$ (the mean of $x_\perp$) in $D$ will give an objective function with the same stationary points as $D$, because $x'_\perp(y) = \langle x_\perp \rangle$ is the stationary point of $D$ with respect to $x'_\perp(y)$.

7. Given that $\Pr(x_\perp|x_0) = \Pr(x_\perp)$ and $x'_\perp(y) = \langle x_\perp \rangle$, the result $\int dx_\perp\, \Pr(x_\perp|x_0)\, \|x_\perp - x'_\perp(y)\|^2 = \text{constant}$ independent of $x_0$ and $y$ follows automatically.

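The assumptions may be summarised as

$$
\begin{aligned}
\Pr(y|x_0, x_\perp) &= \Pr(y|x_0) \\
\Pr(x_\perp|x_0) &= \Pr(x_\perp)
\end{aligned} \tag{8}
$$

which allow the objective function $D$ (see Equation 7) to be replaced by the
equivalent objective function

$$
D = 2 \int dx_0\, \Pr(x_0) \sum_{y_1=1}^{M} \sum_{y_2=1}^{M} \cdots \sum_{y_n=1}^{M} \Pr(y|x_0)\, \|x_0 - x'_0(y)\|^2 + \text{constant} \tag{9}
$$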

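This is the standard FMC objective function (compare Equation 2) for encoding
and reconstructing the undistorted input, for which the directed graph is
$x_0 \to y \to x'_0$. Note that, under the stated assumptions, the simplification in
Equation 9 occurs even if the two subspaces are not orthogonal to each other,
because the potential cross-term $\int dx_\perp\, \Pr(x_\perp|x_0)\, (x_0 - x'_0(y)) \cdot (x_\perp - x'_\perp(y))$ in Equation
7 is zero: with $\Pr(x_\perp|x_0) = \Pr(x_\perp)$ and $x'_\perp(y) = \langle x_\perp \rangle$ it reduces to
$(x_0 - x'_0(y)) \cdot \int dx_\perp\, \Pr(x_\perp)\, (x_\perp - \langle x_\perp \rangle) = 0$.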
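Under the assumptions of Equation 8, the $x_\perp$ term and the potential cross-term of Equation 7 can be evaluated exactly for small discrete distributions. The sketch below assumes scalar $x_0$ and $x_\perp$, an encoder that depends on $x_0$ only, $\Pr(x_\perp|x_0) = \Pr(x_\perp)$ and $x'_\perp(y) = \langle x_\perp \rangle$; the probability tables and the function name `nuisance_terms` are invented for illustration. It shows that the $x_\perp$ term is the same for two different encoders and that the cross-term vanishes.

```python
def nuisance_terms(x0s, p_x0, xps, p_xp, encoder, recon0):
    """Exact x_perp term and cross-term of Equation (7) under Equation (8).

    encoder[a][y] = Pr(y | x0s[a]) (independent of x_perp), Pr(x_perp | x0) = p_xp,
    and x'_perp(y) = <x_perp>. The overall factor of 2 is omitted.
    """
    mean_xp = sum(p * xp for p, xp in zip(p_xp, xps))  # <x_perp>, used as x'_perp(y)
    perp_term = 0.0
    cross_term = 0.0
    for p0, x0, enc in zip(p_x0, x0s, encoder):
        for pp, xp in zip(p_xp, xps):
            for y, py in enumerate(enc):
                w = p0 * pp * py  # joint weight Pr(x0) Pr(x_perp) Pr(y|x0)
                perp_term += w * (xp - mean_xp) ** 2
                cross_term += w * (x0 - recon0[y]) * (xp - mean_xp)
    return perp_term, cross_term


x0s, p_x0 = [0.0, 1.0, 2.0], [0.2, 0.5, 0.3]
xps, p_xp = [-0.5, 0.1, 0.4], [0.3, 0.3, 0.4]
recon0 = [0.3, 1.7]  # arbitrary reconstructions x'_0(y)
perp_a, cross_a = nuisance_terms(x0s, p_x0, xps, p_xp, [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]], recon0)
perp_b, cross_b = nuisance_terms(x0s, p_x0, xps, p_xp, [[0.5, 0.5], [0.5, 0.5], [0.2, 0.8]], recon0)
print(perp_a, perp_b, cross_a, cross_b)
```

In both cases the $x_\perp$ term equals the variance of $x_\perp$, so it contributes only to the constant in Equation 9.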
Figure 6: Equivalent version of Figure 5 in which the reconstruction link is
moved to an equivalent position.

In summary, the encoder has access only to the signal + distortion $(x_0, x_\perp)$
(see Figure 4 and Equation 7), but the assumptions in Equation 8 force the
encoder to disregard the distortion (see Figure 6 and Equation 9). In practice,
it is not possible to satisfy these assumptions in general, because it is not known
in advance how to extract orthogonal signal and distortion subspaces $(x_0, x_\perp)$
given examples of only the distorted signal. However, these assumptions may
be encouraged to hold true by minimising $D$ (as defined in Equation 7) under
certain constraints, in which case Figure 6 and Equation 9 follow automatically
from Figure 4 and Equation 7 respectively. These constraints are discussed in
Section 2.4.

This type of encoder, in which the large degrees of freedom are preferentially
encoded, can be used as the basis of a so-called "residual vector quantiser" [?],
in which (quoting from [?]) "the quantiser has a sequence of encoding stages,
where each stage encodes the residual (error) vector of the prior stage". Note
that a residual vector quantiser is a special case of the type of multistage encoder
discussed in [7].
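To make the quoted definition concrete, here is a minimal sketch of a conventional two-stage residual vector quantiser that uses deterministic nearest-neighbour encoding at each stage. It is not the stochastic encoder of this paper; the code books and function names are invented for illustration.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def residual_vq_encode(stages, x):
    """Each stage quantises the residual left by the previous stage (nearest codeword)."""
    indices, residual = [], list(x)
    for codebook in stages:
        k = min(range(len(codebook)), key=lambda m: squared_distance(codebook[m], residual))
        indices.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices, residual


def residual_vq_decode(stages, indices):
    """Reconstruct by summing the selected codewords of all stages."""
    out = [0.0] * len(stages[0][0])
    for codebook, k in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


stages = [
    [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],                          # coarse ("large") stage
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],  # fine stage
]
indices, residual = residual_vq_encode(stages, [10.8, 0.1])
print(indices, residual_vq_decode(stages, indices))
```

The first stage captures the large component and each later stage only sees what is left over, which is the role the invariant encoder plays when its "large" degrees of freedom are subtracted.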

2.4 Optimisation Constraints 

Henceforth, only the scalar case will be considered, so the vector $y$ is now
replaced by the scalar $y$ ($1 \le y \le M$). In order to implement a practical
optimisation procedure for minimising D it is necessary to introduce a variety 
of assumptions and constraints. 

Because $\Pr(y|x)$ is a probability it satisfies $\Pr(y|x) \ge 0$ and $\sum_{y=1}^{M} \Pr(y|x) = 1$,
which is guaranteed if $\Pr(y|x)$ is written as

$$
\Pr(y|x) = \frac{Q(y|x)}{\sum_{y'=1}^{M} Q(y'|x)} \tag{10}
$$

where $Q(y|x) \ge 0$. This removes the need to explicitly impose the constraint
$\sum_{y=1}^{M} \Pr(y|x) = 1$ during optimisation. The $Q(y|x)$ are the unnormalised likelihoods
of sampling code index $y$ from the code book.
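Equation 10 and the stochastic sampling of a code index can be sketched in a few lines of Python. This assumes the $Q(y|x)$ values for one input $x$ have already been computed; the function names are hypothetical.

```python
import random


def code_probabilities(q_values):
    """Equation (10): Pr(y|x) = Q(y|x) / sum over y' of Q(y'|x), for y = 1..M."""
    if any(q < 0.0 for q in q_values):
        raise ValueError("Q(y|x) must be non-negative")
    total = sum(q_values)
    if total == 0.0:
        raise ValueError("at least one Q(y|x) must be positive")
    return [q / total for q in q_values]


def sample_code_index(q_values, rng):
    """Draw a code index y in 1..M from Pr(y|x) (1-based, as in the text)."""
    probs = code_probabilities(q_values)
    return rng.choices(range(1, len(probs) + 1), weights=probs)[0]


rng = random.Random(0)
q = [0.2, 0.6, 0.2]  # unnormalised likelihoods Q(y|x) for one input x (illustrative)
print(code_probabilities(q), sample_code_index(q, rng))
```

Because the normalisation is built in, an optimiser can adjust the parameters of each $Q(y|x)$ freely without ever violating the sum-to-one constraint.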

However, $Q(y|x)$ itself needs to be described by a finite number of parameters
in order that the values that minimise $D_1 + D_2$ may be derived from a finite
amount of training data. It can be shown that the optimal form of $\Pr(y|x)$ is
piecewise linear in $x$ [?], and that for training data that lie on smooth curved
manifolds the form of this solution is well approximated by a piecewise linear
$Q(y|x)$ of the form [?]

$$
Q(y|x) = \begin{cases} w(y) \cdot x - a(y) & w(y) \cdot x \ge a(y) \\ 0 & w(y) \cdot x < a(y) \end{cases} \tag{11}
$$

which is the same as the functional form used for the neural response in [2].
However, the precise functional form of $Q(y|x)$ needs to exhibit this behaviour
only in the vicinity of the data manifold, so in particular it can be allowed to
saturate (i.e. $Q(y|x) \to 1$) as $w(y) \cdot x \to \infty$. A convenient functional form
that achieves this is the sigmoid, which is defined as

$$
Q(y|x) = \frac{1}{1 + \exp\left(-w(y) \cdot x - b(y)\right)} \tag{12}
$$

This reduces the problem of minimising $D$ to one of finding the optimal values
of the $w(y)$, $b(y)$ and $x'(y)$. This may be done by using the gradient descent
procedure described in [?].
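The gradient descent procedure of the cited reference is not reproduced here. As a stand-in, the sketch below minimises a sample estimate of Equation 2 for a scalar code index, with $\Pr(y|x)$ built from the sigmoid of Equation 12 and normalised by Equation 10, using central finite-difference gradients instead of analytic ones. The toy two-cluster data set, the learning rate, the step count and the flat parameter layout are all assumptions made for illustration.

```python
import math
import random


def sigmoid_q(w, b, x):
    """Equation (12): Q(y|x) = 1 / (1 + exp(-w(y).x - b(y)))."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x)) - b))


def unpack(theta, d):
    """Hypothetical flat layout per code index: [w(y) (d values), b(y), x'(y) (d values)]."""
    return theta[:d], theta[d], theta[d + 1:]


def objective(thetas, data):
    """Sample estimate of Equation (2) for a scalar code index y = 1..M."""
    d = len(data[0])
    total = 0.0
    for x in data:
        parts = [unpack(t, d) for t in thetas]
        q = [sigmoid_q(w, b, x) for w, b, _ in parts]
        z = sum(q)  # normaliser of Equation (10)
        for q_y, (_, _, r) in zip(q, parts):
            total += 2.0 * (q_y / z) * sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return total / len(data)


def gradient_step(thetas, data, lr=0.05, h=1e-5):
    """One plain gradient-descent step, with gradients from central finite differences."""
    grads = []
    for t in thetas:
        g = []
        for k in range(len(t)):
            saved = t[k]
            t[k] = saved + h
            d_up = objective(thetas, data)
            t[k] = saved - h
            d_down = objective(thetas, data)
            t[k] = saved
            g.append((d_up - d_down) / (2.0 * h))
        grads.append(g)
    for t, g in zip(thetas, grads):
        for k, g_k in enumerate(g):
            t[k] -= lr * g_k


rng = random.Random(1)
# Toy 2-D data: two noisy clusters centred on (-1, 0) and (1, 0)
data = [[c + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for c in (-1.0, 1.0) for _ in range(20)]
thetas = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(2)]  # M = 2, d = 2
d_before = objective(thetas, data)
for _ in range(50):
    gradient_step(thetas, data)
d_after = objective(thetas, data)
print(d_before, d_after)
```

Finite differences keep the sketch short and easy to check; a practical implementation would use analytic gradients, since the cost here grows with the number of parameters.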

If the input is an undistorted signal (i.e. $x = (x_0, 0)$) which lies on a smooth
curved manifold, then the sigmoids can cooperate in encoding this input as
illustrated in Figure 7, where the sigmoid threshold planes $w(y) \cdot x + b(y) = 0$
are shown slicing pieces off the curved manifold [?].

The additional constraints that are required in order to implement the
behaviour described in Section 2.3 will now be described. These constraints
must be such that the encoder disregards (or is invariant with respect to) the
nuisance degrees of freedom $x_\perp$ in the full input vector $(x_0, x_\perp)$. However,
without knowing $\Pr(x_0, x_\perp)$ in advance (which would allow $x_0(x)$ in Equation 5 to
be calculated), it is not possible to give a general approach that works in all
cases. At best, an empirical approach must be used.

Figure 7: Illustration of how a number of sigmoids can cooperate to slice pieces
off a signal manifold.

A very simple and useful constraint is to impose a threshold constraint on
the sigmoid function, which forces the value of the sigmoid to lie exactly halfway
up its slope when the component of its input vector along the weight vector,
$\hat{w}(y) \cdot x$, equals $\theta$. This is achieved by choosing $b(y) = -\theta \|w(y)\|$, so that

$$
Q(y|x) = \frac{1}{1 + \exp\left(-\left(\hat{w}(y) \cdot x - \theta\right) \|w(y)\|\right)} \tag{13}
$$

where $\|w(y)\| = \sqrt{w(y) \cdot w(y)}$ and $\hat{w}(y) = w(y) / \|w(y)\|$.
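A direct transcription of Equation 13 is sketched below (function names hypothetical). It checks two properties stated above: the sigmoid sits at exactly one half on the plane $\hat{w}(y) \cdot x = \theta$, anywhere on that plane, and it agrees with Equation 12 when $b(y) = -\theta \|w(y)\|$.

```python
import math


def threshold_sigmoid(w, theta, x):
    """Equation (13): Q(y|x) = 1 / (1 + exp(-(w_hat(y).x - theta) * ||w(y)||))."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    proj = sum(wi * xi for wi, xi in zip(w, x)) / norm_w  # w_hat(y).x
    return 1.0 / (1.0 + math.exp(-(proj - theta) * norm_w))


def sigmoid_with_bias(w, b, x):
    """Equation (12), for comparison."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x)) - b))


w, theta = [3.0, 4.0], 2.0          # ||w|| = 5, w_hat = (0.6, 0.8)
on_plane = [1.2, 1.6]               # w_hat . x = 2 = theta
shifted = [1.2 - 4.0, 1.6 + 3.0]    # moved along (-4, 3), orthogonal to w
print(threshold_sigmoid(w, theta, on_plane), threshold_sigmoid(w, theta, shifted))
```

In this parametrisation $\theta$ fixes the distance of the threshold plane from the origin, while $\|w(y)\|$ only sets how steeply the sigmoid rises across it.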

If the input is a distorted signal (i.e. $x = (x_0, x_\perp)$) which lies on a "thickened"
version of the smooth curved manifold of Figure 7 (the thickness represents the
nuisance degrees of freedom), then the sigmoids can cooperate in encoding this
input as illustrated in Figure 8, where the sigmoid threshold planes $\hat{w}(y) \cdot x = \theta$
are shown slicing pieces off the curved manifold in a way that disregards the
nuisance degrees of freedom. Note that in Figure 8 the representation of thickening
is not complete, because it can actually occur in any direction orthogonal
to the manifold, including directions orthogonal to the space in which the manifold
is embedded; the radial direction in Figure 8 does not include this latter
possibility.

Figure 8: Illustration of how a number of sigmoids can cooperate to slice pieces
off a signal manifold thickened by nuisance degrees of freedom.

In practice, for numerical efficiency and to encourage the optimisation procedure
to locate the global minimum of $D_1 + D_2$, it is useful to introduce
two additional constraints. Firstly, because optimal solutions typically satisfy
$x'(y) \propto w(y)$ (i.e. $x'(y)$ is parallel to $w(y)$ up to a multiplicative constant), each
reconstruction vector $x'(y)$ can be forced to lie parallel to the corresponding weight
vector so that $x'(y) \propto w(y)$; this constraint was also used in [?], but there it was a
necessary part of the optimisation procedure, whereas here it merely encourages faster
convergence. Secondly, the norm of the weight vectors can be constrained as
$\|w(y)\| = w_0$, in order to avoid situations where they grow to rather
large values which make $Q(y|x)$ (and hence $\Pr(y|x)$) depend very strongly on $x$
in some regions. Both of these constraints speed up convergence to the global
minimum of $D_1 + D_2$, and they can finally be lifted in the vicinity of an optimal solution
to obtain complete convergence.
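One simple way to impose both constraints during gradient descent is to project the parameters back onto the constraint set after each unconstrained update. This projection step is an assumption for illustration, not necessarily how the constraints were enforced in the original work; the function name `apply_constraints` is hypothetical.

```python
import math


def apply_constraints(w, r, w0):
    """Map (w(y), x'(y)) onto the set ||w(y)|| = w0 with x'(y) proportional to w(y).

    The weight vector is rescaled to norm w0, and x'(y) is replaced by its
    orthogonal projection onto the line spanned by w(y).
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    w_hat = [wi / norm_w for wi in w]
    s = sum(ri * hi for ri, hi in zip(r, w_hat))  # component of x'(y) along w_hat(y)
    return [w0 * hi for hi in w_hat], [s * hi for hi in w_hat]


# e.g. after an unconstrained gradient step has produced these values:
w_new, r_new = apply_constraints([3.0, 4.0], [1.0, 1.0], w0=2.0)
print(w_new, r_new)
```

Dropping the call to `apply_constraints` near convergence corresponds to lifting the constraints as described above.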

2.5 Jammer Nulling 

A number of examples of typical behaviours of $\Pr(y|x)$ are shown in Figure 9.

In Figure 9 the signal and jammer degrees of freedom generate a pair of
non-orthogonal subspaces, whose axes are indicated in bold. The response contours
of a variety of possible $\Pr(y|x)$ are shown. In the "full" case a pair of $\Pr(y|x)$
respond to the signal and jammer subspaces respectively. In the "signal" case
a $\Pr(y|x)$ responds to only the signal subspace, and is thus invariant over the
jammer subspace. In the "jammer" case the situation is the reverse of the
"signal" case. This argument may readily be generalised to any number of
$\Pr(y|x)$.

Figure 9: Examples of the response of $\Pr(y|x)$ to signal and jammer subspaces.

If it is assumed that the jammer is the "large" degree of freedom and the
signal is the "small" degree of freedom, the signal and jammer subspaces may
be separated by adjusting the threshold parameter $\theta$ so that the "jammer" case
in Figure 9 is obtained, in which case the $\Pr(y|x)$ for $y = 1, 2, \ldots, M$ will
all become invariant over the signal subspace. The jammer subspace is then
spanned by the set of gradient vectors $\nabla \Pr(y|x)$ for $y = 1, 2, \ldots, M$, which can
thus be used to construct a projection operator $J$ onto the jammer subspace,
and a projection operator $1 - J$ onto the signal subspace. This definition of
the projection operator may also be used in cases where the jammer and signal
subspaces are curved, so that the directions of their axes are functions of $x$, and
all of the straight lines in Figure 9 are replaced by curves defining a curvilinear
coordinate system and its coordinate surfaces. Note that curved subspaces are
the norm rather than the exception.
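The construction of $J$ from the gradient vectors $\nabla \Pr(y|x)$ can be sketched as follows. Several choices here are assumptions: the gradients are estimated by central finite differences, $J$ is taken to be the orthogonal projector onto their span (built with Gram-Schmidt), and the three-sigmoid encoder is a hypothetical toy whose weight vectors lie in the $(x_1, x_2)$ plane, so that every $\Pr(y|x)$ is invariant along $x_3$, which plays the role of the signal direction.

```python
import math


def threshold_sigmoid(w, theta, x):
    """Equation (13) with threshold theta."""
    norm_w = math.sqrt(sum(v * v for v in w))
    proj = sum(a * b for a, b in zip(w, x)) / norm_w
    return 1.0 / (1.0 + math.exp(-(proj - theta) * norm_w))


def code_probs(weights, theta, x):
    """Equation (10): normalise the Q(y|x) of all code indices."""
    q = [threshold_sigmoid(w, theta, x) for w in weights]
    total = sum(q)
    return [q_y / total for q_y in q]


def probability_gradients(weights, theta, x, h=1e-6):
    """Central-difference estimate of the gradient of Pr(y|x) w.r.t. x, one vector per y."""
    grads = [[0.0] * len(x) for _ in weights]
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        p_up = code_probs(weights, theta, up)
        p_down = code_probs(weights, theta, down)
        for y in range(len(weights)):
            grads[y][i] = (p_up[y] - p_down[y]) / (2.0 * h)
    return grads


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt; vectors whose residual norm is <= tol are treated as dependent."""
    basis = []
    for v in vectors:
        r = list(v)
        for e in basis:
            c = sum(a * b for a, b in zip(r, e))
            r = [a - c * b for a, b in zip(r, e)]
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol:
            basis.append([a / norm for a in r])
    return basis


def null_jammer(x, basis):
    """Apply 1 - J, where J is the orthogonal projector onto span(basis)."""
    out = list(x)
    for e in basis:
        c = sum(a * b for a, b in zip(x, e))
        out = [a - c * b for a, b in zip(out, e)]
    return out


weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]  # hypothetical encoder, M = 3
x = [0.3, -0.2, 0.5]
basis = orthonormal_basis(probability_gradients(weights, 0.1, x))
nulled = null_jammer(x, basis)
print(len(basis), nulled)
```

Because the $M$ gradients of a normalised $\Pr(y|x)$ always sum to zero, at most $M - 1$ of them are independent, which is why three code indices are used here to span a two-dimensional jammer subspace.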



3 Jammer Nulling Simulations 

The optimisation of the encoder may be done by minimising $D$ using gradient
descent [?], using the threshold sigmoid function in Equation 13 to constrain the
optimisation so that it encodes only the jammer subspace.

In these simulations the input vector $x$ is 100-dimensional so that
$x = (x_1, x_2, \ldots, x_{100})$, and each vector in the training set is independently generated
as a superposition of a pair of response functions

$$
x_i = a_s \, \frac{\sin\left((i - i_s)/\sigma\right)}{(i - i_s)/\sigma} + a_j \, \frac{\sin\left((i - i_j)/\sigma\right)}{(i - i_j)/\sigma} \tag{14}
$$

(each ratio being taken as 1 where its denominator vanishes), where $a_s$ is the signal
amplitude that is uniformly distributed in the interval $[-\sqrt{10^{-3}}, \sqrt{10^{-3}}]$ (this
corresponds to a signal level of $-30$ dB), $a_j$ is the jammer amplitude that is uniformly
distributed in the interval $[-1, 1]$ (this corresponds to a jammer level of 0 dB), $i_s$ is
the signal location that is chosen to be 50, $i_j$ is the jammer location that is uniformly
distributed in the interval $[38 - \Delta, 38 + \Delta]$ ($\Delta = 0, 2, 4$ is used in the simulations),
and $\sigma$ is the width of the response function that is chosen to be 2. The peak and
the first zero of the sinc function are separated by $\pi\sigma$, which defines the resolution
cell size. The mean jammer position and the signal position satisfy $i_s - \langle i_j \rangle = 12$,
which corresponds to a separation of $12 / (\pi\sigma) \approx 2$ resolution cells. Random noise
uniformly distributed in the interval $[-\sqrt{10^{-5}}, \sqrt{10^{-5}}]$ (this corresponds to a noise
level of $-50$ dB) is also added to each component of the training vector.
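The training-set recipe of Equation 14 can be sketched as a small generator. The parameter values follow the text; the function names, the use of Python's `random.Random` and the choice to return the sampled parameters alongside the vector are assumptions for illustration.

```python
import math
import random


def sinc_response(i, centre, sigma):
    """sin(u)/u with u = (i - centre)/sigma, taking the limiting value 1 at u = 0."""
    u = (i - centre) / sigma
    return 1.0 if u == 0.0 else math.sin(u) / u


def training_vector(rng, delta, n=100, sigma=2.0, i_s=50.0, jammer_centre=38.0):
    """One training vector following Equation (14) and the parameter choices in the text."""
    a_s = rng.uniform(-math.sqrt(1e-3), math.sqrt(1e-3))             # signal amplitude (-30 dB)
    a_j = rng.uniform(-1.0, 1.0)                                      # jammer amplitude (0 dB)
    i_j = rng.uniform(jammer_centre - delta, jammer_centre + delta)  # jammer location
    noise = math.sqrt(1e-5)                                           # noise bound (-50 dB)
    x = [a_s * sinc_response(i, i_s, sigma) + a_j * sinc_response(i, i_j, sigma)
         + rng.uniform(-noise, noise) for i in range(1, n + 1)]
    return x, (a_s, a_j, i_j)


rng = random.Random(0)
x, (a_s, a_j, i_j) = training_vector(rng, delta=2.0)
print(len(x), round(i_j, 2))
print(12.0 / (math.pi * 2.0))  # signal-jammer separation in resolution cells
```

The last line evaluates $12 / (\pi\sigma)$ for $\sigma = 2$, which is about 1.91, consistent with the separation of roughly 2 resolution cells quoted above.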




Figure 10: A two-dimensional projection of the curved manifold generated by
the jammer when $\Delta = 2$ (axis label: bin at position 49).

In Figure 10 the 2-dimensional manifold generated by varying the jammer
position over the interval $[38 - \Delta, 38 + \Delta]$ (for $\Delta = 2$), and varying the jammer
amplitude over the interval $[-1, 1]$, is shown. Because the input vector $x$ is
100-dimensional, only a low-dimensional projection can be visualised, and the
2-dimensional vector $(x_{49}, x_{51})$ is displayed here. The curvilinear grid traces out
the coordinate surfaces of jammer position $i_j$ and jammer amplitude $a_j$, and
the whole diagram shows how this grid is embedded in $(x_{49}, x_{51})$-space. Note
that the $a_j$ dimension behaves as a "radial" coordinate (straight lines), whereas
the $i_j$ dimension behaves as an "angular" coordinate (curved lines).

Figure 11: Plot of degree of nulling against nominal jammer location, for jammer
locations that are spread over the intervals [38, 38] using M = 2, [36, 40] using
M = 4, and [34, 42] using M = 6.

In Figure 11 an encoder is trained on three different jammer scenarios
$\Delta = 0, 2, 4$. After training, the encoder is tested for how well it can be used to
null a pure jammer (i.e. with no signal or noise added), where the degree
of nulling is defined as the ratio of the squared lengths of the nulled input
vector and the original input vector. This is a good test of the ability of the
encoder to simultaneously learn the profile of the jammer and the shape of the
jammer manifold which is generated by sweeping this profile over the interval
$[38 - \Delta, 38 + \Delta]$. When $\Delta = 0$ there is a sharp minimum at the jammer location
$i_j = 38$, as expected. When $\Delta = 2$ the minimum becomes spread over the
jammer locations $i_j \in [36, 40]$, and when $\Delta = 4$ the minimum becomes spread
even more broadly over the jammer locations $i_j \in [34, 42]$. All of these results
are as expected.
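The degree of nulling defined above is a one-line computation. The sketch below also converts it to decibels; that conversion is a presentation convenience assumed here, not something specified in the text, and the function names are hypothetical.

```python
import math


def degree_of_nulling(original, nulled):
    """Ratio of the squared lengths of the nulled and the original input vectors."""
    return sum(v * v for v in nulled) / sum(v * v for v in original)


def to_decibels(ratio):
    """Express a power ratio in dB (a presentation choice, not specified in the text)."""
    return 10.0 * math.log10(ratio)


ratio = degree_of_nulling([3.0, 4.0], [0.3, 0.4])
print(ratio, to_decibels(ratio))
```

Smaller values mean better nulling: a nulled vector one tenth the length of the original gives a ratio of 0.01.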




Figure 12: Plot of a typical input vector before and after jammer nulling for
each of the scenarios in Figure 11.

In Figure 12 typical examples of an input vector, together with how it appears
after jammer nulling, are shown for each of the jammer scenarios considered in
Figure 11. In every case the signal is clearly revealed at its correct location after
nulling the jammer.

In all of these training scenarios, one could envisage further constraining 
some of the properties of the encoder, in order to introduce prior knowledge 
of the form of the jammer and/or signal subspaces, and to thereby reduce the 
computational complexity of the jammer nulling. For instance, the signal sub- 
space could be predefined, as in conventional algorithms which hold constant 
the response in a predefined "look direction". Similarly, the jammer subspace 
could be built out of predefined subspaces which are optimised so as to maximally
null the jammer(s), as in conventional algorithms in which a number of jammer
"templates" are used to remove the jammer(s). In general, by choosing appropriate
additional constraints, the SVQ approach to jammer nulling can be made
backward compatible with conventional approaches.

4 Conclusions 

The theory of stochastic vector quantisers (SVQ) [?] has been extended to allow
the quantiser to develop invariances, so that only "large" degrees of freedom 
in the input vector are represented in the code. This has been applied to the 
problem of encoding data vectors which are a superposition of a "large" jammer 
and a "small" signal, so that only the jammer is represented in the code. This 
allows the jammer to be subtracted from the total input vector (i.e. the jammer 
is nulled), leaving a residual that contains only the underlying signal. Several 
numerical simulations have shown how this idea works in practice, even when
the jammer location is uncertain so that the jammer subspace is curved. 

The main advantage of this approach to jammer nulling is that little prior 
knowledge of the jammer is assumed, because these properties are automatically 
discovered by the SVQ as it is trained on examples of input vectors. Provided 
that the signal is much weaker than the jammer, the SVQ acquires an internal 
representation of the jammer and signal manifolds, in which its code is invariant 
with respect to the signal. In a sense, the SVQ regards the "large" jammer as the 
normal type of input that it expects to receive, whereas it regards the "small" 
signal as an anomaly. 